The aim of this project was to evaluate whether or not geotagged social media data can be useful in providing insight into a region’s “Sense of Place” using Santa Barbara as a case study. Sense of Place can be defined as the connection people feel to their geographic surroundings, including both the natural and built environment. Locations with a strong sense of place often have a strong identity felt by both locals and visitors.
Daily geotagged Twitter data from 2015 through 2019 was analyzed to identify spatial patterns of tourists and locals, and to understand how these two user groups engage with or perceive the natural environment of Santa Barbara.
Twitter data was obtained freely through a partnership between UCSB Library and Crimson Hexagon. Before downloading, the data was queried to meet the following conditions:
Accessing Data
Crimson Hexagon only allows 10,000 randomly selected tweets to be exported at a time, manually, in .xls format. Due to this restriction, data was downloaded manually in 2-day windows in order to capture all tweets. On average, around 5,000 tweets per day met these conditions.
The Crimson Hexagon data did not contain all desired information, including whether or not the tweet was geotagged. To get this information we used the Python twarc library to “rehydrate” the data using individual tweet IDs and stored the tweet information as .json files. From there we were able to remove all tweets that did not have a geotag, giving us a total of 79,981 tweets.
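The geotag filtering step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the rehydrated tweets are stored one JSON object per line in Twitter's v1.1 format, where `coordinates` holds an exact point and `place` holds a tagged place. Whether place-only tweets were counted as geotagged in this project is not stated, so treating them as geotagged here is an assumption.

```python
import json

def has_geotag(tweet: dict) -> bool:
    """Return True if the tweet has an exact point or a tagged place.

    Assumes Twitter API v1.1 field names ("coordinates", "place").
    """
    return bool(tweet.get("coordinates")) or bool(tweet.get("place"))

def filter_geotagged(jsonl_lines):
    """Yield parsed tweets that carry a geotag, skipping blank lines."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if has_geotag(tweet):
            yield tweet

# Two made-up records: one with a point coordinate, one without any geotag.
sample = [
    '{"id_str": "1", "coordinates": {"type": "Point", '
    '"coordinates": [-119.714, 34.4258]}, "place": null}',
    '{"id_str": "2", "coordinates": null, "place": null}',
]
kept = list(filter_geotagged(sample))
```

In practice `jsonl_lines` would be an open file handle over the rehydrated output rather than an in-memory list.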
| Month | Day | Time | Year | full_text | user_location | retweet_count | favorite_count | month_num | date |
|---|---|---|---|---|---|---|---|---|---|
| Dec | 8 | 17:24:44 | 2018 | Just posted a photo @ Santa Barbara, California https://t.co/y6sQXmeJ3h | New York, USA | 0 | 0 | 12 | 2018-12-08 |
| Jan | 28 | 02:05:43 | 2017 | I’m at Draughtsmen Aleworks in Goleta, Calif https://t.co/OUIyKihU9K | Santa Barbara, CA | 0 | 0 | 1 | 2017-01-28 |
| Feb | 14 | 19:18:20 | 2018 | Safety first on this Valentine’s Day! #twodozen #roses🌹 @ Santa Barbara, California https://t.co/FjFxaapHzX | California | 0 | 0 | 2 | 2018-02-14 |
| Feb | 27 | 01:45:10 | 2017 | The Phoenix reunited (and Tucker) Congratulations mrzaccampbell and… https://t.co/leNiSuwRH9 | LA | 0 | 0 | 2 | 2017-02-27 |
| Dec | 19 | 02:28:37 | 2015 | Merry making. #cheers (@ Nectar Eatery & Lounge in Santa Barbara, CA) https://t.co/foQ9VDeYha https://t.co/6eYHjinqai | ➳ Los Angeles | 0 | 0 | 12 | 2015-12-19 |
| Feb | 9 | 07:34:06 | 2016 | Detail of the Santa Barbara Courthouse. Spanish revival/Spanish Moderne Masterpiece… https://t.co/rFzW9FhDna | Los Angeles | 0 | 0 | 2 | 2016-02-09 |
| Jul | 9 | 00:01:11 | 2016 | I turned around and saw these and just like my son said, “ooh ooooh ooh.” #guitarbar #funkzone… https://t.co/DvgrYHu5V5 | Austin, Texas | 0 | 0 | 7 | 2016-07-09 |
The spatial distribution of tweets highlights areas of higher population density and tourist areas in downtown Santa Barbara.
There is a single coordinate that has over 11,000 tweets reported across all years. It is near De La Vina between Islay and Valerio. There is nothing remarkable about this site so I assume it is the default coordinate when people tag “Santa Barbara” generally. The coordinate is 34.4258, -119.714.
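One way to spot a default coordinate like this is to count how often each latitude/longitude pair occurs. The sketch below is illustrative only; the function name and the choice to round to four decimal places are assumptions, and the sample points are made up apart from the coordinate quoted above.

```python
from collections import Counter

def most_common_coordinate(points, ndigits=4):
    """Return ((lat, lon), count) for the most frequent rounded coordinate."""
    counts = Counter(
        (round(lat, ndigits), round(lon, ndigits)) for lat, lon in points
    )
    return counts.most_common(1)[0]

# Tiny illustrative sample: the default coordinate appears twice.
points = [
    (34.4258, -119.714),
    (34.4258, -119.714),
    (34.4133, -119.6856),
]
coord, n = most_common_coordinate(points)
```

A coordinate whose count is far above every other location, with nothing notable on the ground, is a good candidate for a platform default.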
As you zoom in on the map, clusters will disaggregate. You can click on blue points to see the tweet.
Each hexagon shows the log10 density of tweets in that area. The highest number of tweets in a single location is around 11,000 (yellow hex). This includes the default Santa Barbara coordinate, which is used when a tweet is geotagged to the city of Santa Barbara without a precise location.
The number of geotagged tweets is going down over time. There is a significant drop in tweets at the end of April 2015. It seems this is due to “a change in Twitter’s ‘post Tweet’ user-interface design results in fewer Tweets being geo-tagged” (source). The first 4 months of 2015 have 15,720 tweets, or roughly 19% of all tweets. To reduce skew in the data and remove tweets from those months that may have been geotagged without the user’s knowledge, we moved forward with only the tweets from May 1, 2015 through the end of 2019.
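The date restriction amounts to a simple range filter. A minimal sketch, assuming each tweet is a dict with an ISO-formatted `date` string as in the sample table (the function and constant names are hypothetical):

```python
from datetime import date

START = date(2015, 5, 1)    # first day kept
END = date(2019, 12, 31)    # last day kept

def in_study_period(rows):
    """Keep rows whose 'date' (YYYY-MM-DD) falls within the study period."""
    return [r for r in rows if START <= date.fromisoformat(r["date"]) <= END]

rows = [
    {"date": "2015-04-30"},  # dropped: before the interface change cutoff
    {"date": "2015-05-01"},
    {"date": "2019-12-31"},
]
kept_rows = in_study_period(rows)
```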
Because the dataset is limited to geotagged tweets, I hypothesize that most of these tweets contain a picture or a link to an Instagram post. We can detect links by looking for “t.co” in the tweet text, which is Twitter’s shortened URL for a separate webpage. These are often Twitter or Instagram photos, but we can’t be 100% certain.
It looks like 93% of geotagged tweets contain a link or picture.
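The link check can be sketched as a pattern search over the tweet text. This is an illustration rather than the code used for the figure above: the regular expression is slightly stricter than a plain search for “t.co” (it requires the full shortened-URL shape), and the function names are assumptions.

```python
import re

# Matches Twitter's shortened links, e.g. https://t.co/y6sQXmeJ3h
TCO_LINK = re.compile(r"https?://t\.co/\w+")

def has_link(text: str) -> bool:
    """Return True if the tweet text contains a t.co link."""
    return bool(TCO_LINK.search(text))

def share_with_links(texts) -> float:
    """Fraction of tweets containing at least one t.co link."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(has_link(t) for t in texts) / len(texts)

texts = [
    "Just posted a photo @ Santa Barbara, California https://t.co/y6sQXmeJ3h",
    "No link in this one",
]
share = share_with_links(texts)
```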
This project aims to understand if and how preferences differ between tourists and locals for nature-based places within the Santa Barbara area. In order to test this, we needed a way to identify tourists and locals. We used a two-step process.
First, if the user has self-identified their location as somewhere in the Santa Barbara area, they are designated a local. This includes Carpinteria, Santa Barbara, Montecito, Goleta, Gaviota and UCSB. For the remainder, we use the number of times they have tweeted from Santa Barbara within a year to designate user type. If someone has tweeted from Santa Barbara in more than 2 months of the same year, they are identified as a local. This is consistent with how Eric Fischer determined tourists in his work. This is not foolproof, and there are instances where people visit and tweet from Santa Barbara in more than two months of a year, especially if they are visiting family or live within a couple of hours’ driving distance.
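The two-step rule can be sketched as follows. This is a simplified illustration under stated assumptions: tweets are dicts with hypothetical `user`, `user_location`, and `date` (YYYY-MM-DD) fields, and the self-reported location is matched by case-insensitive substring, which may differ from how the matching was actually done.

```python
from collections import defaultdict

# Place names from the text; substring matching is an assumption.
LOCAL_PLACES = ("carpinteria", "santa barbara", "montecito",
                "goleta", "gaviota", "ucsb")

def classify_users(tweets):
    """Return {user: 'local' | 'tourist'} using the two-step rule."""
    users = set()
    self_identified = set()
    months_by_user_year = defaultdict(set)  # (user, year) -> months seen

    for t in tweets:
        user = t["user"]
        users.add(user)
        # Step 1: self-reported location in the Santa Barbara area.
        location = (t.get("user_location") or "").lower()
        if any(place in location for place in LOCAL_PLACES):
            self_identified.add(user)
        # Step 2 bookkeeping: distinct months tweeted in each year.
        year, month, _day = t["date"].split("-")
        months_by_user_year[(user, year)].add(month)

    frequent = {
        user for (user, _year), months in months_by_user_year.items()
        if len(months) > 2  # more than 2 months in the same year
    }
    return {
        u: "local" if (u in self_identified or u in frequent) else "tourist"
        for u in users
    }

tweets = [
    {"user": "a", "user_location": "Santa Barbara, CA", "date": "2017-01-28"},
    {"user": "b", "user_location": "LA", "date": "2017-02-27"},
    {"user": "b", "user_location": "LA", "date": "2017-05-02"},
    {"user": "b", "user_location": "LA", "date": "2017-09-14"},
    {"user": "c", "user_location": "Austin, Texas", "date": "2016-07-09"},
    {"user": "d", "user_location": None, "date": "2018-03-01"},
    {"user": "d", "user_location": None, "date": "2018-04-01"},
]
user_types = classify_users(tweets)
```

Here user `b` becomes a local through the frequency rule alone, while user `d`, with tweets in exactly two months, stays a tourist.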
There are 21,811 tweets from tourists and 45,420 tweets from locals (32% and 68%). There are 12,460 unique tourists and just 1,893 unique local users.
The following map shows areas that have more tweets from locals (orange) or tourists (purple). Note that the values indicate the log10 of the absolute difference between the number of tweets from each user group. If a hex is purple and has a value of 2, this means there are 100 more tweets from tourists than from locals at that location.
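As a worked example of how a hex value relates to the underlying counts, the sketch below computes the group and the log10 of the absolute difference for one hex. The function name and the handling of ties are assumptions for illustration. Because the value is based on a difference rather than a ratio, a value of 2 corresponds to a gap of 100 tweets.

```python
import math

def hex_value(tourist_count: int, local_count: int):
    """Return (dominant group, log10 of the absolute count difference)."""
    diff = tourist_count - local_count
    if diff == 0:
        return ("tie", None)  # log10(0) is undefined; tie handling is assumed
    group = "tourist" if diff > 0 else "local"
    return (group, math.log10(abs(diff)))

# 150 tourist tweets vs. 50 local tweets: a difference of 100.
example = hex_value(150, 50)
```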
The full text of each tweet was classified as either nature-based or not. We developed a coarse dictionary of words that indicate a nature-based tweet. These include natural features like ocean, coast, and park, and words that indicate recreation (fishing, hiking, camping, etc.).
Note: I had a hard time finding an ontology or lexicon that would fit this project. These are definitely skewed more towards nature and recreation rather than words like “home” or “connection”.
## [1] "hike" "trail" "hiking" "camping" "tent"
## [6] "climb" "summit" "fishing" "sail" "sailing"
## [11] "boat" "boating" "ship" "cruise" "cruising"
## [16] "bike" "biking" "dive" "diving" "surf"
## [21] "surfing" "paddle" "swim" "ocean" "beach"
## [26] "[^a-z]sea" "sand" "coast" "island" "wave"
## [31] "fish" "whale" "dolphin" "pacific" "crab"
## [36] "lobster" "water" "shore" "marine" "seawater"
## [41] "lagoon" "slough" "saltwater" "underwater" "tide"
## [46] "aquatic" "[^a-z]tree" "[^a-z]earth" "weather" "sunset"
## [51] "sunrise" "[^a-z]sun" "climate" "park" "wildlife"
## [56] "[^a-z]view" "habitat" "[^a-z]rock" "nature" "mountains"
## [61] "[^a-z]peak" "canyon" "pier" "wharf" "environment"
## [66] "ecosystem" "flower"
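A dictionary like this can be applied by joining the entries into one regular expression and searching the lowercased tweet text. The sketch below uses only a subset of the entries and assumes plain substring matching, which is a guess at how the dictionary was applied. Entries such as `[^a-z]sea` require a non-letter character before the word, so “research” does not count as “sea”.

```python
import re

# Subset of the dictionary above, for illustration only.
NATURE_PATTERNS = [
    "hike", "trail", "camping", "boat", "surf", "beach",
    "[^a-z]sea", "whale", "sunset", "park",
]
NATURE_RE = re.compile("|".join(NATURE_PATTERNS))

def is_nature_based(text: str) -> int:
    """Return 1 if any dictionary pattern occurs in the text, else 0."""
    return int(bool(NATURE_RE.search(text.lower())))

flags = [
    is_nature_based("i’m on a boat. #ifeelgross #whalewatching"),
    is_nature_based("I’m at Draughtsmen Aleworks in Goleta, Calif"),
    is_nature_based("research day"),
    is_nature_based("by the sea"),
]
```

Because the matching is by substring, an entry like `park` also matches “parking”, which is one reason the dictionary is best described as coarse.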
Let’s look at some examples of what tweets qualified as “nature-based”.
| date | full_text | user_location | user_type | nature_word |
|---|---|---|---|---|
| 2018-10-06 | ⛺️ 🍳 ⛺️ 🍲 ⛺️ 🐟 ⛺️ 🥘 are you in santa barbara this weekend? get thee to the rei sunday evening 6:30-8pm to see fellow gourmet girls go camping cookbook author topressandbeyond doing a camp… https://t.co/rxemvb2akt | California, USA | tourist | 1 |
| 2017-02-19 | anyone else enjoying the swell this weekend? #westcoastlife @ california coastline https://t.co/nvyprwfowp | Huntington Beach, CA | tourist | 1 |
| 2016-03-20 | i’m on a boat. #ifeelgross #whalewatching #whalemigration #🐳 @ santa barbara, california https://t.co/i5dnd6y7yw | Southern California | tourist | 1 |
| 2016-04-17 | i’m at goleta beach county park in goleta, ca https://t.co/rc6r5iw1cx | NA | tourist | 1 |
| 2015-07-23 | day break over #fsbellavista (📷 riichelly) #fourseasons @ four seasons resort the biltmore santa… https://t.co/n3bryxhi6z | Santa Barbara, California | local | 1 |
| 2015-11-30 | surfland, pictures in reverse and upside down, that is till you turn the plate right side up. except… https://t.co/8jcid9bmzw | Brooklyn, NY USA | local | 1 |
| 2016-12-17 | dog walk just before sunset in #santabarbara. #travel @ san antonio creek trail goleta https://t.co/xec7miwxtb | Santa Barbara | local | 1 |
All groups show increases over time in the proportion of tweets that are nature-based.
After identifying nature-based tweets we can take a look at where these tweets are coming from and compare to the general pattern of tweets.
Not surprisingly, there are fewer nature-based tweets than non-nature-based ones: 24% of all geotagged tweets are nature-based.
Of local tweeters, 21% of tweets are nature-based. Of tourists, 30% are nature-based.
California Protected Areas Database
We can use the CPAD data to identify protected areas. [expand on CPAD here]
We can look at the top 20 most popular tweeted-from sites. The green highlighted portion represents nature-based tweets. The number indicates what percentage of all tweets are nature-based at each site. Names in bold indicate over 50% of tweets are nature-based.
We can also look at the breakdown between tourists and locals.
At the lower end we see more locals than tourists visiting these sites. These tend to be less popular areas. On the upper end, we see sites that are more frequented overall, and more frequented by tourists. These include well-known areas like the Santa Barbara Harbor and Stearns Wharf. Those on the lower end that locals frequent more are either lesser-known (Shoreline Park and Alameda Park are both neighborhood parks) or further from the main tourist areas (e.g. Goleta Beach).
We only have 3,846 unique users from within CPAD areas. This is 27% of all unique users in the dataset.
We can apply sentiment analysis to the Twitter data to try to understand patterns and trends in the general sentiment of tweets.
The top graph shows the total number of geotagged tweets, which has gone down over time across tourists and locals.
The bottom graph shows average daily sentiment scores over time. Above 0 is positive, below 0 is negative. We see that tweets are mostly positive, and that average sentiment is growing over time.
Data is harder to find
Assign each designated area to a location type such as coastal, urban, foothill, or mountain, and see whether interesting trends appear across those types. We expect more tweets in coastal and urban areas. Coastal and mountain areas may have more nature-based tweets than urban and foothill areas.
There might be an interesting comparison between rural, suburban, and urban areas. We hypothesize that the tourist/local alignment would split in urban areas, maybe be aligned in suburban areas (like SB), and maybe not exist in rural areas.
Proportion of words that are nature based tells you how people. In Santa Barbara, there will be a lot of nature-based sense of place. In Manhattan, we wouldn’t expect to see nature based ones so much.
In a blog piece we can pose questions that we couldn’t answer, such as “can the proportion of tourists/locals in place engagement tell us anything?”
Could compare the % of nature-based tweets in SB to other areas. If we did this across the whole state, what proportion would be nature-based? Maybe on average it’s just 5%.
Where and why do locals and tourists overlap in their use of an area? SB seems to have a high alignment of tourists/locals, which may be helpful for local policy. Other places may show distinct differences in how tourists and locals use them.
Look at coastal cities of different sizes: rural, small town, urban, megacity. Could see how tourist/local patterns differ across scale.
Is there a threshold of tourists where locals don’t go anymore?
In areas where we see both tourists and locals engaging, what characteristics do we see?
Quantifying transitions between rural to city.